Clohui: an Efficient Algorithm for Mining Closed

نویسندگان

  • Shiming Guo
  • Hong Gao
چکیده

High-utility itemset mining (HUIM) is an important research topic in data mining field and extensive algorithms have been proposed. However, existing methods for HUIM present too many high-utility itemsets (HUIs), which reduces not only efficiency but also effectiveness of mining since users have to sift through a large number of HUIs to find useful ones. Recently a new representation, closed + high-utility itemset (CHUI), has been proposed. With this concept, the number of HUIs is reduced massively. Existing methods adopt two phases to discover CHUIs from a transaction database. In phase I, an itemset is first checked whether it is closed. If the itemset is closed, an overestimation technique is adopted to set an upper bound of the utility of this itemset in the database. The itemsets whose overestimated utilities are no less than a given threshold are selected as candidate CHUIs. In phase II, the candidate CHUIs generated from phase 1 are verified through computing their utilities in the database. However, there are two problems in these methods. 1) The number of candidate CHUIs is usually very huge and extensive memory is required. 2) The method computing closed itemsets is time consuming. Thus in this paper we propose an efficient algorithm CloHUI for mining CHUIs from a transaction database. CloHUI does not generate any candidate CHUIs during the mining process, and verifies closed itemsets from a tree structure. We propose a strategy to make the verifying process faster. Extensive experiments have been performed on sparse and dense datasets to compare CloHUI with the state-of-the-art algorithm CHUD, the experiment results show that for dense datasets our proposed algorithm CloHUI significantly outperforms CHUD: it is more than an order of magnitude faster, and consumes less memory.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Calculation of One-dimensional Forward Modelling of Helicopter-borne Electromagnetic Data and a Sensitivity Matrix Using Fast Hankel Transforms

The helicopter-borne electromagnetic (HEM) frequency-domain exploration method is an airborne electromagnetic (AEM) technique that is widely used for vast and rough areas for resistivity imaging. The vast amount of digitized data flowing from the HEM method requires an efficient and accurate inversion algorithm. Generally, the inverse modelling of HEM data in the first step requires a precise a...

متن کامل

Mining Closed Strong Association Rules by Rule-growth in Resource Effectiveness Matrix

Association rules mining approach can find the relationship among items. Using association rules mining algorithm to mine resource fault, can reduce the number of wrong alarm resources to be replaced. This paper proposed an efficient association rules mining algorithm: CSRule, for mining closed strong association rules based on association rule merging strategies. CSRule algorithm adopts severa...

متن کامل

A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining

Data envelopment analysis (DEA) is a relatively new data oriented approach to evaluate performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relative limited period, DEA has been converted into a strong quantitative and analytical tool to measure and evaluate performance. In an article written by Toloo et al. (2009...

متن کامل

An efficient Imperialistic Competitive Algorithm for the closed-loop supply chains considering pricing for product, and fleet of heterogeneous vehicles

 This paper investigates a pricing multi-period, closed-loop supply chains (CLSCs) with two echelons of producers and customers. Products are delivered to customers might be defective which are picked up and gathered in the collection center and will be fixed if it is possible and will be returned to the chain. Otherwise, they are sold as waste. This problem is determining price and distributio...

متن کامل

Polynomial-Delay and Polynomial-Space Algorithms for Mining Closed Sequences, Graphs, and Pictures in Accessible Set Systems

In this paper, we study efficient closed pattern mining in a general framework of set systems, which are families of subsets ordered by set-inclusion with a certain structure, proposed by Boley, Horváth, Poigné, Wrobel (PKDD’07 and MLG’07). By modeling semi-structured data such as sequences, graphs, and pictures in a set system, we systematically study efficient mining of closed patterns. For a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016